Abstract

Based on measurements of Internet topology data, we found that two mechanisms are
necessary for correctly modeling the Internet topology at the Autonomous Systems (AS) level:
the Interactive Growth of new nodes and new internal links, and a nonlinear preferential
attachment, where the preference probability is described by a positive-feedback mechanism.
Based on these mechanisms, we introduce the Positive-Feedback Preference (PFP) model, which
accurately reproduces many topological properties of the AS-level Internet, including degree
distribution, rich-club connectivity, the maximum degree, shortest-path length, short cycles,
disassortative mixing and betweenness centrality. The PFP model is a phenomenological
model which provides a novel insight into the evolutionary dynamics of real complex networks.

PACS numbers: 89.75.-k, 87.23.Ge, 05.70.Ln

I. INTRODUCTION



Recently there has been considerable effort to understand the topology of complex
networks. Of particular interest are complex networks obtained from evolving mechanisms,
like the Internet or the World Wide Web, as they are so influential in our daily life. The
degree k of a node is the number of links which have the node as an end-point or,
equivalently, the number of nearest neighbors of the node. The statistical distribution of
the degree, $P(k)$, gives important information about the global properties of a network and
can be used to characterize different network topologies. The Internet has been studied in
detail (see, e.g., [13]) since measured data became available. It is now well known that the
Internet can be represented as a Scale-Free (SF) network, where the degree distribution is a
power law $P(k) \sim k^{-\gamma}$. The exponent $\gamma$ of the Internet at the Autonomous Systems (AS)
level is approximately 2.22 (see Fig. 1 and Fig. 2).

FIG. 1: Degree distribution. The AS-level Internet topology data used in this research is a
traceroute-derived AS graph measured in April 2002.

FIG. 2: The cumulative degree distribution $P_{cum}(k)$ of the AS graph decays as
$P_{cum}(k) \sim k^{-1.22}$, hence the degree distribution $P(k) \sim k^{-\gamma}$ with exponent $\gamma \approx 2.22$.

Barabási and Albert (BA) showed that it is possible to grow a network with a power-law
degree distribution by using a preferential-growth mechanism: starting with a small random
network, the system grows by attaching a new node with m links to m different "old" nodes
that are already present in the system (m = 3 to obtain Internet-like networks); the
attachment is preferential because the probability that a new node will connect to node i,
with degree $k_i$, is

$$\Pi(i) = \frac{k_i}{\sum_j k_j}. \qquad (1)$$

The BA model generates networks with the power-law exponent $\gamma = 3$ [20].
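As an illustration of Eq. (1), the following minimal Python sketch computes the linear preference probabilities and draws m distinct old nodes for a new node. The function names `linear_preference` and `choose_old_nodes` and the example degree list are hypothetical, and drawing the m nodes one at a time without replacement is an assumption about a detail that the description above leaves open.

```python
import random


def linear_preference(degrees):
    """Return Pi(i) = k_i / sum_j k_j for every node i (Eq. 1)."""
    total = sum(degrees)
    return [k / total for k in degrees]


def choose_old_nodes(degrees, m, rng):
    """Draw m distinct old nodes, each draw proportional to node degree.

    Assumption: nodes are drawn one at a time without replacement.
    """
    candidates = list(range(len(degrees)))
    weights = [float(k) for k in degrees]
    chosen = []
    for _ in range(m):
        pos = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(pos))
        weights.pop(pos)
    return chosen


# Hypothetical example: four old nodes with degrees 1, 2, 3 and 4.
degrees = [1, 2, 3, 4]
probabilities = linear_preference(degrees)  # node 3 holds 4/10 of the preference
targets = choose_old_nodes(degrees, m=3, rng=random.Random(1))
```

The sketch assumes every old node has degree of at least one, so that the weights never sum to zero.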

Based on the BA model, a number of evolving network models have been introduced to
obtain degree distributions with other power-law exponents. Some of these new models have
been used to model the Internet. However, a network model based solely on the reproduction
of the power-law exponent of the degree distribution has its limitations, as it will not
describe the Internet hierarchical structure [8]. In the next section we investigate two
properties of the Internet which were not accurately modeled by the existing models, namely
the rich-club connectivity among high-degree nodes and the maximum degree of the
network. The accurate modeling of these two properties was our motivation for developing
a new network model. In Section III we introduce the Positive-Feedback Preference (PFP)
model, which is a phenomenological model of the AS-level Internet topology. Section IV
presents the validation of the model and Section V gives the conclusions of this work.



II. CHALLENGES IN ACCURATE MODELING OF THE INTERNET 



The Rich-Club 



Scale-free networks can be grouped into assortative, disassortative and neutral networks
[22, 24]. Social networks (e.g. the co-authorship network) are assortative networks, in
which high-degree nodes prefer to attach to other high-degree nodes. Information networks 
(e.g. the World Wide Web and the Internet) and biological networks (e.g. protein interaction 
networks) have been classified as disassortative networks, in which high-degree nodes tend 
to connect with low-degree ones. 





FIG. 3: Two disassortative networks. (a) High-degree nodes are loosely interconnected.
(b) High-degree nodes are tightly interconnected.



While the AS-level Internet is disassortative [10, 11], this property does not imply that the
high-degree nodes are tightly interconnected to each other (see Fig. 3). One of the structural
properties of the AS-level Internet is that it contains a small number of high-degree nodes. 
We call these nodes, "rich" nodes, and the set containing them, the "rich-club" . The inter- 
connectivity among the club members is quantified by the rich-club connectivity [2]] which 
is defined as follows. The rank r of a node denotes its position on a list of all nodes sorted 
in decreasing degree. If the network has N nodes then $r \in [1, N]$. If the rich-club consists
of the first r nodes in the rank list, then the rich-club connectivity $\phi(r/N)$ is defined as
the ratio of the number of links connecting the club members over the maximum number
of allowable links, $r(r-1)/2$. The rich-club connectivity measures how well club members
"know" each other. A rich-club connectivity of 1 means that all the members have a direct 
link to any other member, i.e. they form a fully connected subgraph. 
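The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the measurement code used for the AS graph: the function name `rich_club_connectivity` and the toy edge list are hypothetical, the graph is assumed to be simple and undirected with each link listed once, and ties in degree are broken by node label, which is an assumption since the text does not specify a tie-breaking rule.

```python
def rich_club_connectivity(edges, r):
    """Rich-club connectivity of the r highest-degree nodes (r >= 2).

    Returns (links among club members) / (r * (r - 1) / 2).
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Rank list: decreasing degree; ties broken by node label (assumption).
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    club = set(ranked[:r])
    links_inside = sum(1 for u, v in edges if u in club and v in club)
    return links_inside / (r * (r - 1) / 2)


# Hypothetical toy graph: a triangle (0, 1, 2) with one leaf hanging off each corner.
edges = [(0, 1), (1, 2), (0, 2), (0, 3), (1, 4), (2, 5)]
phi_top3 = rich_club_connectivity(edges, 3)  # the three corners are fully connected
phi_top4 = rich_club_connectivity(edges, 4)  # adding a leaf dilutes the club
```

In this toy graph the three highest-degree nodes form a fully connected subgraph, so their rich-club connectivity is 1, while the club of four nodes holds 4 of the 6 allowable links.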

Fig. 4 shows the rich-club connectivity as a function of the rank normalized by the number
of nodes. It is clear that in the AS graph the high-degree nodes are tightly interconnected.
The top 1% best-connected nodes of the AS graph have 27% of the possible interconnections,
compared with only 4.5% obtained from a network topology generated using the BA model
which has the same number of nodes and a slightly larger number of links than the AS graph
(see Table I).

FIG. 4: Rich-club connectivity $\phi(r/N)$ vs normalized rank r/N. The top 1% best-connected nodes
are marked with the vertical hash line.

The rich-club consists of highly connected nodes which are well interconnected with
each other, and the average hop distance among the club members is very small (1 to 2 hops).
The rich-club is a "super" traffic hub of the network, and the disassortative mixing property
ensures that peripheral nodes are always near the hub. These two structural properties
together contribute to the routing efficiency of the network. An Internet model that does not
reproduce the properties of the rich-club will underestimate the actual network's routing
efficiency (shortest-path length) and routing flexibility (alternative reachable paths), and
it will also overestimate the network robustness under node attack.



The Interactive Growth Model 



The BA model is based solely on the attachment of new nodes. However, the appearance of
new internal links among old nodes has also been observed in the evolution of the Internet
[11]. During the last few years, researchers have proposed a number of Internet models using
the appearance of new internal links, such as Dorogovtsev and Mendes' model [26], Bu and
Towsley's Generalized Linear Preference (GLP) model, Bianconi et al.'s Generalized
Network Growth (GNG) model, Caldarelli et al.'s model and the Interactive Growth
(IG) model [30]. In addition to the appearance of new internal links, these models have also
used different preference schemes to capture selected properties of the Internet.

TABLE I: Network Parameters

Property                    Symbol                      AS graph   PFP model   IG model   BA model
Number of nodes             N                           11122      11122       11122      11122
Number of links             L                           30054      30151       33349      33349
Average degree              $\langle k \rangle$         5.4        5.4         6.0        6.0
Exponent of power-law       $\gamma$                    2.22       2.22        2.22       3
Rich-club connectivity      $\phi(r/N = 0.01)$          0.27       0.30        0.32       0.045
Max. degree                 $k_{max}$                   2839       2785        700        292
Degree distribution         $P(k = 1)$                  26%        28%         26%        0%
Degree distribution         $P(k = 2)$                  38%        36%         34%        0%
Degree distribution         $P(k = 3)$                  14%        12%         11%        40%
Characteristic path length  $\ell^*$                    3.13       3.14        3.6        4.3
Average triangle coef.      $\langle k_t \rangle$       12.7       12          10.4       0.1
Max. triangle coef.         $k_{t-max}$                 7482       8611        4123       64
Average quadrangle coef.    $\langle k_q \rangle$       277        247         105.4      1.3
Max. quadrangle coef.       $k_{q-max}$                 9648       9431        8780       527
Average $k_{nn}$            $\langle k_{nn} \rangle$    660        482         103        20
Average betweenness         $\langle C_B^* \rangle$     4.13       4.14        4.6        5.3
Max. betweenness            $C_{B-max}^*$               3237       3419        1002       1064

Here we revisit the Interactive Growth (IG) model, as it is the precursor of the
Positive-Feedback Preference model and provides a possible way to reproduce both the
power-law degree distribution and the rich-club connectivity of the AS graph. The IG model
generates a network using the Interactive Growth, where new internal links start from the
host nodes, which are the old nodes that new nodes are attached to. The IG model starts
with a small random network. At each time step,

• with probability $p \in (0, 1)$, a new node is attached to one host node and two new
internal links appear between the host node and two other old nodes (peer nodes);

• with probability $1 - p$, a new node is attached to two host nodes and one new internal
link appears between one of the host nodes and a peer node.

In the actual Internet, new nodes bring new traffic load to their host nodes. This results in
both an increase of traffic volume and a change of traffic pattern around host nodes, and
triggers the addition of new links connecting host nodes to peer nodes in order to balance
network traffic and optimize network performance. From numerical simulations, we found
that when p = 0.4 the Interactive Growth also satisfies the following two characteristics
observed [10, 11, 12] in the Internet measurements. Firstly, the majority of new nodes
are added to the system by attaching them to one or two old nodes ($m \le 2$). Secondly,
the degree distribution of the AS graph is not a strict power law, as it has more nodes with
degree two than nodes with degree one (P(2) = 38% > P(1) = 26%, see Table I). The
IG model uses the BA model's linear preference of Eq. (1) in the attachment of new nodes
and the appearance of new internal links. As shown in Fig. 1, Fig. 2, Fig. 4 and Table I,
the IG model closely resembles both the power-law degree distribution and the rich-club
connectivity of the AS graph.



B. Maximum Degree 

The IG model still has its limitations. The maximum node degree $k_{max}$ present in the AS
graph is nearly a quarter of the number of nodes ($k_{max} \sim N/4$) and is significantly larger
than the maximum degree obtained by the IG and BA models using linear preferential
attachment (see Table I). To overcome this shortfall, it is possible to favor high-degree
nodes by using the nonlinear preferential probability [26]

$$\Pi(i) = \frac{k_i^{\alpha}}{\sum_j k_j^{\alpha}}, \quad \alpha > 1. \qquad (2)$$

To examine the above nonlinear preference, here we study a so-called Test* model, which
is a modification of the IG model. The Test* model uses the same Interactive Growth
mechanism as the IG model, but it does not use the linear preference given by Eq. (1);
instead it uses the nonlinear preference given by Eq. (2). Numerical experiments showed
that, when $\alpha = 1.15 \pm 0.01$, the Test* model generates networks with a maximum degree
similar to that of the AS graph. However, as shown in Fig. 4, the rich-club connectivity
produced by the Test* model deviates from the AS graph. For example, the 1% best-connected
nodes of the Test* model have 42% of the allowable interconnections, compared with 27% for
the AS graph.

FIG. 5: Node degree k vs rank r.



III. POSITIVE-FEEDBACK PREFERENCE MODEL

Based on the Internet-history data, Pastor-Satorras et al. and Vázquez et al.
measured that the probability that a new node links with a low-degree old node follows the
linear preferential attachment given by Eq. (1), whereas Chen et al. reported that
high-degree nodes have a stronger ability to acquire new links than predicted by Eq. (1). The
Internet-history data also show that at early times the degree of a node increases very slowly;
later on, the degree grows more and more rapidly. Taking into account these observations,
we modified the IG model by using the nonlinear preferential attachment

$$\Pi(i) = \frac{k_i^{1 + \delta \log_{10} k_i}}{\sum_j k_j^{1 + \delta \log_{10} k_j}}. \qquad (3)$$

We call this the Positive-Feedback Preference (PFP) model. From numerical simulations, we
found that $\delta = 0.048$ produces the best result. (It is interesting to notice that for $\delta = 0.048$
and the maximum degree $k_{max} = 2839$ as measured on the AS graph, the exponent
$1 + \delta \log_{10} k_{max} \approx 1.166$, which is close to the value of $\alpha$ used in the Test* model.)
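The arithmetic behind the remark above can be checked with a short Python sketch. The helper names `pfp_exponent` and `pfp_weight` are hypothetical; the sketch only evaluates the degree-dependent exponent of the PFP preference and the corresponding unnormalized weight for a few degrees.

```python
import math


def pfp_exponent(k, delta=0.048):
    """Degree-dependent exponent 1 + delta * log10(k) of the PFP preference."""
    return 1.0 + delta * math.log10(k)


def pfp_weight(k, delta=0.048):
    """Unnormalized PFP preference k ** (1 + delta * log10(k))."""
    return k ** pfp_exponent(k, delta)


# The exponent starts at 1 for k = 1 (linear preference) and grows with degree.
for k in (1, 10, 100, 1000, 2839):
    print(k, round(pfp_exponent(k), 3), round(pfp_weight(k) / k, 2))
```

For k = 1 the exponent is exactly 1, which is the linear preference; for k = 2839 it rounds to 1.166. The last printed column is the factor by which the PFP weight exceeds the linear weight k.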

We also refine the Interactive Growth mechanism. The PFP model starts with a small
random network. At each time step,

• with probability $p \in [0, 1]$, a new node is attached to one host node; and at the same
time one new internal link appears between the host node and a peer node;

• with probability $q \in [0, 1 - p]$, a new node is attached to one host node; and at the
same time two new internal links appear between the host node and two peer nodes;

• with probability $1 - p - q$, a new node is attached to two host nodes; and at the same
time one new internal link appears between one of the host nodes and one peer node.

When p = 0.3 and q = 0.1, the generated PFP network has the same ratio of nodes to links
as in the AS graph (see Table I). Eq. (3) is used in choosing host nodes and peer nodes.
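The three growth steps listed above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the code used to produce the results in this paper: the names `pfp_weight`, `pick_node` and `grow_pfp` are hypothetical; a triangle stands in for the "small random network"; internal links start from the first host node; peers already linked to that host are skipped so that no duplicate links arise; and all choices within one time step use the degrees from before that step.

```python
import math
import random


def pfp_weight(k, delta):
    """Unnormalized preference k ** (1 + delta * log10(k))."""
    return k ** (1.0 + delta * math.log10(k))


def pick_node(adj, delta, rng, exclude):
    """Pick one node outside `exclude`, with probability proportional to pfp_weight."""
    nodes = [n for n in adj if n not in exclude]
    weights = [pfp_weight(len(adj[n]), delta) for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


def grow_pfp(n_nodes, p=0.3, q=0.1, delta=0.048, seed=0):
    """Grow a network with the three PFP steps; returns {node: set of neighbors}."""
    rng = random.Random(seed)
    # Assumption: a triangle stands in for the "small random network".
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
    while len(adj) < n_nodes:
        x = rng.random()
        if x < p:
            n_hosts, n_peers = 1, 1
        elif x < p + q:
            n_hosts, n_peers = 1, 2
        else:
            n_hosts, n_peers = 2, 1
        hosts = []
        for _ in range(n_hosts):
            hosts.append(pick_node(adj, delta, rng, exclude=set(hosts)))
        source = hosts[0]  # assumption: internal links start from the first host
        peers = []
        for _ in range(n_peers):
            # Assumption: skip nodes already linked to the host (no duplicate links).
            exclude = adj[source] | {source} | set(peers)
            if len(exclude) == len(adj):
                break  # the host is already linked to every old node
            peers.append(pick_node(adj, delta, rng, exclude))
        new = len(adj)
        adj[new] = set(hosts)
        for host in hosts:
            adj[host].add(new)
        for peer in peers:
            adj[source].add(peer)
            adj[peer].add(source)
    return adj


network = grow_pfp(600)
n_links = sum(len(neighbors) for neighbors in network.values()) // 2
links_per_node = n_links / len(network)
```

With p = 0.3 and q = 0.1, each time step adds on average 2p + 3q + 3(1 - p - q) = 2.7 links per new node, which can be compared with L/N = 30054/11122 ≈ 2.70 for the AS graph in Table I.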
FIG. 6: Three degree functions: $k$, $k^{\alpha}$ with $\alpha = 1.15$ and $k^{1 + \delta \log_{10} k}$ with $\delta = 0.048$.

The PFP model satisfies the observations of Pastor-Satorras et al., Vázquez et al. and Chen
et al. For low-degree nodes, the preferential attachment is approximated by Eq. (1). For
high-degree nodes, the preferential attachment increases as a nonlinear function of the node
degree (see Fig. 6). Hence, as time passes, the rate of degree growth in the PFP
model is faster than in the IG model and the BA model (see Fig. 7).



IV. MODEL VALIDATION 

The validation was done by comparing the AS graph with networks generated by the 
PFP model, the IG model and the BA model. For each model, ten different networks were 
generated and averaged. The networks had the same number of nodes and similar numbers 
of links as the AS graph (see Table I).

A. Degree Distribution, Rich-Club Connectivity and Maximum Degree 

The PFP model produces networks that closely match the degree distribution (see Fig. 1
and Fig. 2), the rich-club connectivity (see Fig. 4) and the maximum degree (see Table I) of
the AS graph. Also, the networks generated using the PFP model have the same power-law
relationship between degree and rank, $k \sim r^{-0.85}$, as the AS graph (see Fig. 5). In a certain
respect the accuracy of the PFP model in reproducing these properties is not a surprise. After
all, the model was designed to match these properties.

B. Shortest-Path Length

FIG. 8: Cumulative distribution of average shortest-path length.

FIG. 9: Correlation between average shortest-path length $\ell$ and degree, where $\ell$ is the average over
nodes with the same degree.

The average shortest-path length $\ell$ of a node is defined as the average of the shortest
paths from the node to all other nodes in the network. Fig. 8 and Fig. 9 show that the
PFP model reproduces the cumulative distribution of average shortest-path length and the
correlation between average shortest-path length and degree of the AS graph.

The characteristic path length $\ell^*$ of a network is the average of the shortest paths over all
pairs of nodes. The characteristic path length indicates the network's overall routing efficiency.
The AS graph is a small-world network [22] because the characteristic path length is very
small compared with the network size. Table I shows that the AS graph and the networks
obtained from the PFP model have nearly the same characteristic path length.







C. Short Cycles 

Cycles [28, 33] encode the redundant information in the network structure. The number
of short cycles (triangles and quadrangles) is a relevant property because the multiplicity
of paths between any two nodes increases with the density of short cycles (note that an
alternative path between two nodes can be longer than their shortest path). The triangle
coefficient $k_t$ is defined as the number of triangles that a node shares. Similarly, the
quadrangle coefficient $k_q$ is the number of quadrangles that a node has.

Table I shows that the AS graph and the networks generated using the PFP model have higher
densities of short cycles ($\langle k_t \rangle$ and $\langle k_q \rangle$) than networks generated using the IG model and
the BA model. Fig. 10 and Fig. 11 show that the AS graph and the networks obtained from the
PFP model have similar cumulative distributions of short cycles. Fig. 12 and Fig. 13 show
that the PFP networks exhibit similar correlations between short cycles and degree as in
the AS graph.

Notice that the clustering coefficient c of a node can be expressed as a function of the
node's degree k and triangle coefficient $k_t$,

$$c = \frac{k_t}{k(k-1)/2}. \qquad (4)$$

The reason we study short cycles instead of clustering coefficient is that short cycles have
the advantage of providing neighbor clustering information of nodes with different degrees.
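The relation between the triangle coefficient and the clustering coefficient can be illustrated with a short Python sketch. The function names and the toy adjacency map are hypothetical, the graph is assumed to be simple and undirected, and the clustering coefficient of a node with fewer than two neighbors is set to zero by convention, which is an assumption.

```python
from itertools import combinations


def triangle_coefficient(adj, node):
    """k_t: number of triangles that contain `node`."""
    return sum(1 for u, v in combinations(adj[node], 2) if v in adj[u])


def clustering_coefficient(adj, node):
    """c = k_t / (k * (k - 1) / 2), with c = 0 when k < 2 (assumed convention)."""
    k = len(adj[node])
    if k < 2:
        return 0.0
    return triangle_coefficient(adj, node) / (k * (k - 1) / 2)


# Hypothetical toy graph: a triangle (0, 1, 2) plus a leaf 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
kt_0 = triangle_coefficient(adj, 0)
c_0 = clustering_coefficient(adj, 0)
c_1 = clustering_coefficient(adj, 1)
```

Nodes 0 and 1 both share exactly one triangle, yet their clustering coefficients differ (1/3 and 1) because their degrees differ. The triangle coefficient keeps the raw count, so it can be compared across nodes of different degree.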



D. Disassortative Mixing 

The Internet exhibits disassortative mixing behavior [10, 23, 24], where on average,
high-degree nodes tend to connect to peripheral nodes with low degrees. A network's mixing
pattern is identified by the conditional probability $p_c(k'|k)$ that a link connects a node with
degree k to a node with degree k'. This conditional probability can be indicated [11] by
$k_{nn}$, the nearest-neighbors average degree of a node with degree k.

Fig. 14 and Table I show that on average the nearest-neighbors average degree of a node
in the AS graph and the PFP networks is significantly larger than that in the IG and BA
networks. Fig. 15 shows that the PFP model closely reproduces the negative correlation
between nearest-neighbors average degree and node degree of the AS graph and therefore
exhibits similar disassortative mixing as the AS graph.
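As a minimal sketch of how the nearest-neighbors average degree can be tabulated per degree class, consider the following Python fragment. The function name `knn_by_degree` and the star-shaped toy graph are hypothetical, and the graph is assumed to be simple and undirected.

```python
def knn_by_degree(adj):
    """Map degree k to the mean nearest-neighbors average degree of nodes with degree k."""
    sums, counts = {}, {}
    for node, neighbors in adj.items():
        k = len(neighbors)
        if k == 0:
            continue  # isolated nodes have no neighbors to average over
        node_knn = sum(len(adj[n]) for n in neighbors) / k
        sums[k] = sums.get(k, 0.0) + node_knn
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


# Hypothetical toy graph: a star with hub 0 and four leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
knn = knn_by_degree(star)
```

In the star, the leaves (k = 1) have a nearest-neighbors average degree of 4 while the hub (k = 4) has 1. A nearest-neighbors average degree that decreases with k in this way is the signature of disassortative mixing.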



E. Betweenness Centrality 

On a network, there are nodes that are more prominent because they are highly used
when transferring information. A way to measure this "importance" is by using the concept
of node betweenness centrality, which is defined as follows. Given a source node s and a
destination node d, the number of different shortest paths from s to d is g(s, d). The number
of shortest paths that contain the node w is g(w; s, d). The proportion of shortest paths
from s to d which contain node w is $p_{s,d}(w) = g(w; s, d)/g(s, d)$. The betweenness centrality
of node w is defined [34, 35] as

$$C_B(w) = \sum_s \sum_{d \neq s} p_{s,d}(w), \qquad (5)$$

where the sum is over all possible pairs of nodes with $s \neq d$. The betweenness centrality
measures the proportion of shortest paths which visit a certain node. If all pairs of nodes of
a network communicate at the same rate, the betweenness centrality estimates the node's
capacity needed for a free-flow status [34]. A node with a large $C_B$ is "important" because
it carries a large traffic load. If this node fails or gets congested, the consequences to
the network traffic can be drastic. Here the betweenness centrality is normalized by the
number of nodes and denoted as $C_B^*$. The average of the (normalized) betweenness centrality
in a network is $\langle C_B^* \rangle = \ell^* + 1$ [35], where $\ell^*$ is the network's characteristic path length.
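The definition of betweenness centrality and its relation to the characteristic path length can be checked on a small graph with the brute-force Python sketch below. The function names and the four-node ring are hypothetical, the graph is assumed to be connected and undirected, and the end-points s and d are counted as lying on their own shortest paths. As an assumption made for exactness on small graphs, the sketch divides by N - 1 rather than N; for a network as large as the AS graph the two normalizations are practically indistinguishable.

```python
from collections import deque


def bfs_counts(adj, s):
    """Distances from s and the number of shortest paths from s to every node."""
    dist = {s: 0}
    sigma = {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """C_B(w) = sum over ordered pairs s != d of g(w; s, d) / g(s, d)."""
    nodes = list(adj)
    info = {s: bfs_counts(adj, s) for s in nodes}
    cb = {w: 0.0 for w in nodes}
    for s in nodes:
        dist_s, sigma_s = info[s]
        for d in nodes:
            if d == s:
                continue
            for w in nodes:
                dist_w, sigma_w = info[w]
                # w lies on a shortest s-d path iff the distances add up.
                if dist_s[w] + dist_w[d] == dist_s[d]:
                    cb[w] += sigma_s[w] * sigma_w[d] / sigma_s[d]
    return cb


def characteristic_path_length(adj):
    """Average shortest-path length over all ordered pairs of distinct nodes."""
    nodes = list(adj)
    total = 0
    for s in nodes:
        dist, _ = bfs_counts(adj, s)
        total += sum(dist[d] for d in nodes if d != s)
    return total / (len(nodes) * (len(nodes) - 1))


# Hypothetical example: a ring of four nodes.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
cb = betweenness(ring)
n = len(ring)
mean_normalized = sum(cb.values()) / n / (n - 1)
path_length = characteristic_path_length(ring)
```

The identity follows because a shortest path of length $\ell$ visits $\ell + 1$ nodes, so summing $C_B(w)$ over all nodes counts $\ell + 1$ for every ordered pair. The triple loop is cubic in the number of nodes and is meant only for small examples.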

Fig. 16 shows that the cumulative distributions of betweenness centrality $P_{cum}(C_B^*)$ of the
networks exhibit similar power-law behaviors characterized by slope -1.1, hence
$P(C_B^*) \sim (C_B^*)^{-2.1}$ [10]. However, as shown in Table I, the maximum value of the betweenness
centrality, $C_{B-max}^*$, for the AS graph and the PFP model is significantly larger than that
for the IG model and the BA model. Fig. 17 shows that only the PFP model closely matches
the correlation between betweenness centrality and degree of the AS graph.



V. CONCLUSIONS AND DISCUSSION 



In summary, the PFP model accurately reproduces many of the topological properties
measured in the Internet at the AS level. The model is based on two growth mechanisms:
the nonlinear positive-feedback preferential attachment and the Interactive Growth of
new nodes and new internal links. Both mechanisms are based on (and supported by)
observations of the Internet history data.

The positive-feedback preference means that, as a node acquires new links, the node's
relative advantage when competing for more links increases as a nonlinear feedback loop.
This implies that the inequality in the link-acquiring ability between rich nodes and non-rich
nodes increases as the network evolves. Rich nodes not only become richer, they become
disproportionately richer. While our initial motivation was to create a model that can
accurately reproduce the rich-club connectivity and the maximum degree of the AS graph,
the PFP model actually captures other properties as well. Further studies are needed to
explain why the Internet growth seems to follow the nonlinear preferential attachment given
by the PFP model and what the consequences of this growth mechanism are for the future
of the Internet. This research provides an insight into the basic mechanisms that could be
responsible for the evolving topology of complex networks.

Finally, the validation of the model was not conducted with measurement data based
on the BGP-tables, but with the traceroute-derived AS graph, which is regarded as a more
realistic and reliable measurement of the Internet.

FIG. 10: Cumulative distribution of triangle coefficient.

FIG. 11: Cumulative distribution of quadrangle coefficient.

FIG. 12: Correlation between triangle coefficient $k_t$ and degree, where $k_t$ is the average over nodes
with the same degree.

FIG. 13: Correlation between quadrangle coefficient $k_q$ and degree, where $k_q$ is the average over
nodes with the same degree.

FIG. 14: Cumulative distribution of nearest-neighbors average degree.

FIG. 15: Correlations between nearest-neighbors average degree $k_{nn}$ and degree, where $k_{nn}$ is the
average over nodes with the same degree.

FIG. 16: Cumulative distribution of betweenness centrality, $P_{cum}(C_B^*)$.

FIG. 17: Correlations between betweenness centrality $C_B^*$ and degree, where $C_B^*$ is the average over
nodes with the same degree.